Malaria is a global public health challenge, with drug resistance a major barrier to disease control and elimination.
To meet the urgent need for better treatments and vaccines, a deeper knowledge of Plasmodium biology and
malaria epidemiology is required. An improved understanding of the genomic variation of malaria parasites,
especially the most virulent Plasmodium falciparum (Pf) species, has the potential to yield new insights in these areas.
High-throughput sequencing and genotyping are generating large amounts of genomic data across multiple parasite
populations. The resulting ability to identify informative variants, particularly single-nucleotide polymorphisms
(SNPs), will lead to the discovery of intra- and inter-population differences and thus enable the development of
genetic barcodes for diagnostic assays and clinical studies. Knowledge of genetic variability underlying drug
resistance and other differential phenotypes will also facilitate the identification of novel mutations and contribute to
surveillance and stratified medicine applications. The PlasmoView interactive web-browsing tool enables the
research community to visualise genomic variation and annotation (eg, biological function) in a geographic setting.
The first release contains over 600 000 high-quality SNPs in 631 Pf isolates from laboratory strains and four
malaria-endemic regions (West Africa, East Africa, Southeast Asia and Oceania).

Malaria parasites cause disease in approximately 650 million people and Plasmodium falciparum (Pf) in
particular kills up to 1 million people each year [1]. Antimalarial drug resistance is a major public health
problem that hinders disease control and elimination efforts [2]. Pf parasites from almost all malaria-endemic
countries show modest levels of drug resistance, especially to chloroquine [3]. Recent evidence indicates that
Pf parasites in Cambodia and Thailand are developing resistance to artemisinin, currently the most effective
anti-malaria intervention [4-7]. An improved understanding of Pf genetics has provided new insights into the
molecular mechanisms of drug resistance [8-10] and may ultimately lead to new treatments and reduce the
global disease burden [11].

High-throughput sequencing technologies and large-scale genotyping chips are generating genome-wide Pf data
on an unprecedented scale. This means that it is now possible to densely map genomic variation (eg,
single-nucleotide polymorphisms, SNPs) and assess global diversity. Knowledge of the diversity of variants
across populations will enable the biological interrogation of novel mutations and identification of candidate
vaccines. Other potential applications include SNP-based assays to barcode parasites over time and space for
epidemiological, diagnostic and clinical studies. However, there are several roadblocks to the translation of
genomic variation into useful public health tools. These include difficulties in obtaining robust phenotypic
data (eg, drug resistance and mosquito transmission) on samples and the lack of web-based informatics tools to
access robust genomic data, which are needed to launch further experiments and translational activities.

Whilst the PlasmoDB [12] and GeneDB [13] web-browsers provide genomic annotation and support the
investigation of SNPs in individual parasite strains, there is a need for additional information (eg, allele
frequencies and population statistics) across the growing collection of publicly available sequences and
genotypes from clinical Pf isolates. We are making available to the malaria community the PlasmoView
web-browser (http://pathogenseq.lshtm.ac.uk/plasmoview) to facilitate the investigation of genome-wide
polymorphisms in parasites from different malaria-endemic regions.

RESULTS 

SNP Data 

We aligned raw sequence and genotyping data (to the 3D7 reference genome, version 3, 80.9% AT, 23 Mb,
14 chromosomes) from 971 Pf samples from four regions: West Africa (WAF: Burkina Faso, Gambia, Ghana,
Mali and Senegal), East Africa (EAF: Kenya and Malawi), Southeast Asia (SEA: Cambodia, Thailand and
Vietnam) and Oceania (OCE: Papua New Guinea; Supplementary Table S1). After quality control procedures,
631 samples (8 laboratory/reference, 367 WAF, 88 EAF, 151 SEA and 17 OCE) were retained. The alignments
and high coverage (>10-fold) enabled the identification of 593 579 (2.6% of all nucleotides) high-quality SNPs
across the nuclear genome (51% exonic, 33% non-synonymous; Supplementary Figure 1A). These SNPs include
the 86 158 SNPs identified in 227 samples [14]. Seventy-three per cent (434 290) of SNPs are rare (minor
allele frequency, MAF < 1%) and 4% (29 036) are common (MAF > 10%, see Supplementary Figure 1B).
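
The rare and common categories above can be illustrated with a short calculation. The following Python sketch is ours and is not part of the PlasmoView pipeline: it assumes one allele call per sample (as for a clonal, haploid blood-stage infection) and biallelic SNPs, and the function names and the "intermediate" label are hypothetical. Only the two cut-offs (MAF < 1% and MAF > 10%) come from the text.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """MAF of one biallelic SNP from per-sample allele calls (None = missing)."""
    called = [a for a in alleles if a is not None]
    if not called:
        raise ValueError("no called alleles at this SNP")
    counts = Counter(called)
    if len(counts) > 2:
        raise ValueError("expected at most two alleles")
    if len(counts) == 1:
        return 0.0  # monomorphic in this sample set
    return min(counts.values()) / len(called)


def frequency_class(maf, rare_below=0.01, common_above=0.10):
    """Label a SNP using the cut-offs quoted in the text (MAF < 1%, MAF > 10%)."""
    if maf < rare_below:
        return "rare"
    if maf > common_above:
        return "common"
    return "intermediate"  # hypothetical label for everything in between


# Invented calls for illustration only: 95 reference and 5 alternative alleles.
example_calls = ["A"] * 95 + ["T"] * 5
example_maf = minor_allele_frequency(example_calls)
example_class = frequency_class(example_maf)
```

Missing calls are dropped before counting, so the denominator is the number of samples with a call at that SNP rather than the full sample size.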

The linkage disequilibrium between non-rare markers reveals decay with physical genetic distance
(Supplementary Figure 1C). This decay is faster in the African regions (WAF and EAF) than in SEA or OCE.
As expected, a principal component analysis reveals that the SNP data differentiate the samples by geographic
region (Supplementary Figure 1D) and population (not shown). To identify the polymorphisms driving the
population differentiation, we applied a SNP-wise Fst approach, where values range between 0 and 1 and
higher values imply more differentiation [15]. The application across regions identifies 467 SNPs with
Fst > 0.4, indicating a high level of regional differentiation. As expected, the Fst values of intra-region
comparisons are substantially lower (% SNPs with Fst > 0.1: WAF 0.2%, EAF 1.6%, SEA 6.6% and OCE n/a)
than inter-region comparisons (% SNPs with Fst > 0.1: 27.9%) in common SNPs (MAF > 15%).
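
To make the SNP-wise Fst concrete, the sketch below computes a simple heterozygosity-based version, Fst = (Ht - Hs) / Ht, for one biallelic SNP. This is an illustrative assumption on our part: the estimator actually used here is the one described in [15] and may differ (for example, in how it corrects for sample size). The function name and the example frequencies are hypothetical.

```python
def snp_fst(pop_freqs, pop_sizes):
    """Heterozygosity-based Fst for one biallelic SNP.

    pop_freqs: frequency of one allele in each population
    pop_sizes: number of samples called in each population (used as weights)
    """
    total = sum(pop_sizes)
    weights = [n / total for n in pop_sizes]
    # Pooled allele frequency and the heterozygosity expected from it (Ht).
    p_bar = sum(w * p for w, p in zip(weights, pop_freqs))
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # allele fixed everywhere: no differentiation to measure
    # Weighted mean of the within-population heterozygosities (Hs).
    h_within = sum(w * 2 * p * (1 - p) for w, p in zip(weights, pop_freqs))
    return (h_total - h_within) / h_total


# Invented frequencies: populations fixed for different alleles vs identical populations.
fully_differentiated = snp_fst([0.0, 1.0], [10, 10])
undifferentiated = snp_fst([0.3, 0.3], [10, 20])
```

With this definition, a SNP fixed for different alleles in two populations scores 1 and a SNP with identical frequencies everywhere scores 0, matching the 0 to 1 interpretation given above.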

PlasmoView 

The web-based PlasmoView tool presents genome-wide variation and geographical information on Pf. The
implementation contains a first release of over 370 million data points (approximately 600 k SNPs in over
600 samples described above). PlasmoView provides real-time visualisation and summary statistics and is a
timely tool for the high-level interrogation of large genomic datasets. Pf data are presented in two views:
(a) the matrix view provides a colour-coded indication of mutation locations ordered by genetic and
geographical location; and (b) the map view shows a global view of SNP prevalence by country. The matrix
view includes graphical and real-time textual SNP-by-SNP genomic annotation (reference alleles, subtelomeric
regions, gene regions, amino acid changes and genomic uniqueness) as well as MAF and Fst graphs plotted
for the selected data. Both matrix and map views are interactive, enabling researchers to easily navigate the
data via the menu at the top of the application or using mouse buttons. The PlasmoView launch page includes
a facility to search by common gene name (eg, RH5 or CRT), previous id (PFD1145c or MAL7P1.27) and the
latest gene nomenclature (PF3D7_0424100 or PF3D7_0709000). The tool is scalable for the significant increase
in data that is anticipated over the next few years. In the next sections we demonstrate the utility of
PlasmoView using specific loci in the Pf genome.

Drug resistance in Pf is commonly due to SNPs in genes involved in the biological pathways that antimalarials
target. The matrix view in PlasmoView is ideally suited to display all observed mutations on a gene-by-gene
basis. Figure 1 shows the mutations for each sample in PfCRT (PF3D7_0709000), a gene on chromosome 7
associated with chloroquine resistance [16]. Sixty-one SNPs (18 non-synonymous) have been detected in this
gene, 5 of which are known to be involved in drug resistance. The high regional differentiation is measured by
the SNP-wise Fst (maximum 0.63, see blue histogram track in Figure 1). In PlasmoView the geographic spread
of any mutation is further visualised in the map view, which displays SNP frequencies by geographical
location. For example, Figure 2 shows the frequency of mutations observed across samples from each country
at chromosome 7, position 403 615 (the PfCRT K76T amino acid mutation [17, 18]). PlasmoView enables users
to easily identify mutation sites within each gene, estimate their allele frequencies and investigate their
geographical context, via the matrix view, the AF and Fst graphs and the map view, respectively. The matrix
views for three other genes (PfDHFR, PfDHPS and PfMDR1) involved in drug resistance are presented in
Supplementary Figures 2A-C and are also provided as part of the set of pre-loaded examples of interest for the
web-browser.

Chromosome Pf3D7_07_v3 (SNPs 37810, Length 1501717); Gene PF3D7_0709000: Range 403222 to 406317 (SNPs 61, Length 3096)

Figure 1. Global mutations in the Pf chloroquine resistance transporter gene (PfCRT). The 61 SNPs (18
non-synonymous) identified in PfCRT (Chromosome 7, PF3D7_0709000) are shown. Some markers (eg, Q271E)
show evidence that they have gone to fixation in SEA (blue on the left axis) and are common throughout the
rest of the world (WAF: green; EAF: yellow; OCE: purple on the left axis). Six known drug resistance markers
are shown, as well as 3 common SNPs located in intronic regions (IT).

Figure 2. Global distribution of the resistance-conferring PfCRT mutation K76T. The global prevalence of the
K76T mutation (Chromosome 7, position 403 615) can be seen with the mutation fixed or close to fixed in most
countries. Parasites may have reverted to the wild-type allele, due to the complete withdrawal of chloroquine
from some countries (eg, in East Africa, yellow on the left axis). The LAB Pf strains (laboratory-adapted and
imported Pf strains) are located over the South Atlantic. Information on each bar chart is available by holding
the cursor over it.
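
The per-country frequencies displayed in the map view amount to a grouped allele count. The Python sketch below is a hypothetical illustration of that calculation and not PlasmoView's own code: the function name and input layout are assumptions, and the calls are invented rather than taken from the dataset.

```python
from collections import defaultdict


def allele_frequency_by_country(calls, allele):
    """Frequency of `allele` per country.

    calls: iterable of (country, allele_call) pairs, one per sample;
           a call of None marks a missing genotype and is skipped.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for country, call in calls:
        if call is None:
            continue
        totals[country] += 1
        if call == allele:
            hits[country] += 1
    # Countries with no called samples are left out rather than reported as 0.
    return {country: hits[country] / totals[country] for country in totals}


# Invented calls at a single codon ("K" = wild type, "T" = mutant); not real data.
example_calls = [
    ("Ghana", "T"), ("Ghana", "K"),
    ("Cambodia", "T"), ("Cambodia", "T"),
    ("Kenya", None),
]
mutant_frequency = allele_frequency_by_country(example_calls, "T")
```

Leaving out countries without any called sample avoids plotting a frequency of zero where there is simply no information.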

Whilst a number of SNPs within drug resistance loci have been well characterised, other more recent and
important regions may require investigation. For example, a major region of chromosome 13 (1.73 Mb to
1.82 Mb) in the Pf genome was identified to be undergoing a selective sweep due to artemisinin resistance in
Southeast Asia [8]. It is possible to visualise the SNP variations in this region for all 631 global parasite
isolates and laboratory-adapted parasite strains. The SNP density is very high in this region (1355 SNPs in
50 kb, 1.77 Mb to 1.82 Mb, Figure 3), so we use the SNP-by-SNP information to direct further investigation.
There are interesting SNPs progressing to fixation (high MAF and low Fst) and those showing large
geographical differences (high Fst). For example, the SNP at position 1 793 121 in PF3D7_1344700
(Supplementary Figures 2D and E, MAF 17%, Fst 0.38) has been identified as having the highest association
with parasite clearance rate ([8], denoted SNP 1). PlasmoView shows 46 SNPs in this region whose Fst is
greater than 0.1, including 14 non-synonymous SNPs and one synonymous SNP in PF3D7_1343800 (22 785 bp,
591 SNPs) and four non-synonymous SNPs and one synonymous SNP in PF3D7_1344300 (4768 bp, 132 SNPs).

The lack of an effective licensed vaccine remains one of the most significant gaps in the arsenal to control
and eliminate Pf malaria [19]. The Pf Reticulocyte Binding Protein Homologue 5 (encoded by the PfRH5 gene,
PF3D7_0424100) is considered a promising candidate antigen, as it seems to be essential for the invasion of
multiple laboratory-adapted Pf lines and clinical Pf isolates into red blood cells [19]. High genetic variation
in target genes can pose difficulties in the design of a vaccine that covers the full range of diversity across
populations. The diversity observed for PfRH5 across 227 clinical Pf isolates was low, with only 5
non-synonymous SNPs (out of 12) reaching a frequency greater than 10% in at least 1 population [19]. Using
PlasmoView to analyse more than double the number of samples and 5 additional populations, we identify 22
non-synonymous SNPs (out of 28) and confirm the locus has low diversity (Figure 4), with only 5 SNPs
exhibiting a frequency greater than 10%.

Detecting balancing selection is one method to identify signatures of acquired immunity and therefore
potential targets for vaccines. As immunity to the commonest alleles rises in malaria-endemic areas, parasites
expressing rarer alleles experience a selective advantage. This process maintains a balance of alleles in the
population, with neither the common alleles moving to fixation nor the rare alleles moving to extinction.
When multiple alleles are maintained within populations and none of them achieves fixation, balancing
selection forces are believed to be present. In PlasmoView, loci under balancing selection can readily be
visualised as SNPs with intermediate MAF (10% to 40%) and low Fst (indicating little population
differentiation). These signatures are shown in vaccine targets previously identified using methods to detect
balancing selection, including the MSP3.8 (merozoite surface protein 3.8, PF3D7_1036300, Supplementary
Figure 2F) and AMA1 (apical membrane antigen 1, PF3D7_1133400) genes [11, 20, 21]. The malaria vaccine
FMP2.1/AS02a is a recombinant protein (FMP2.1) based on AMA1 and has been tested in clinical trials [22]
(see Figure 5).
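
The visual signature described above (intermediate MAF and low Fst) can also be written as a simple filter over per-SNP summary statistics. The Python sketch below is a hypothetical illustration, not a PlasmoView feature. The MAF window of 10% to 40% is taken from the text, whereas the Fst cut-off of 0.1 for "low" is our assumption, as are the function name, the record layout and the example positions.

```python
def balancing_selection_candidates(snps, maf_min=0.10, maf_max=0.40, fst_max=0.1):
    """Return positions of SNPs with intermediate MAF and low Fst.

    snps: list of dicts with keys "pos", "maf" and "fst".
    fst_max is an assumed cut-off for "low" Fst; the text does not give one.
    """
    return [
        snp["pos"]
        for snp in snps
        if maf_min <= snp["maf"] <= maf_max and snp["fst"] < fst_max
    ]


# Invented summary statistics for illustration only.
example_snps = [
    {"pos": 101, "maf": 0.25, "fst": 0.02},   # intermediate MAF, low Fst: kept
    {"pos": 202, "maf": 0.25, "fst": 0.45},   # geographically differentiated: dropped
    {"pos": 303, "maf": 0.005, "fst": 0.01},  # rare: dropped
]
candidates = balancing_selection_candidates(example_snps)
```

A filter of this kind only flags SNPs for inspection; a formal test such as Tajima's D would still be needed to support a claim of balancing selection.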

The Pf mitochondrial genome (Pf_M76611, 6 kb, 3 genes (Cox3, Cox1, CytB), GC content 31.6%) is
uniparentally inherited and does not undergo recombination. Therefore, mitochondrial DNA (mt) sequence
divergence and variation data have been used to study the evolutionary history and migration of Pf [23-25],
particularly gene flow out of Africa. Analyses of the global patterns of mt sequence variation have revealed
geographic differentiation [24-26]. PlasmoView shows 85 SNPs, including the three SNPs with MAF greater
than 5% and Fst in excess of 0.05 (mt772, mt1692 and mt1776, see Figure 6), which have previously been
utilised in diversity studies [23, 24, 26]. We do not observe a mutation at amino acid position 268 in CytB
(cytochrome B, cob, MAL_MITO_3:4293) that had been correlated to atovaquone resistance following treatment
failure [25].

DISCUSSION 

The continued public health burden of malaria and emergence of drug resistance require the development of
new treatments, vaccines and control measures. These efforts are likely to benefit from a deeper understanding
of Pf biology and malaria epidemiology, in part deriving from analysis of parasite genetic variation. New
genomic approaches, using massively parallelisable sequencing and genotyping technologies, are generating
vast amounts of Pf genetic data. However, to fully achieve their scientific potential, including the initiation of
further experiments and translational activities, web-based informatics tools are needed to help researchers to
access this genomic information. The PlasmoView web-browser condenses and summarises SNP information
derived from these technologies into a shared and interpretable visual form. The first release contains nearly
600 k SNPs in over 600 samples that have gone through the same high-quality control procedures as previous
studies [14, 27, 28], with additional validation using clonal samples that underwent both sequencing and
genotyping. Genome annotation and measures of quality control (eg, uniqueness) incorporated into the tool
allow a further level of assessment of variant quality. The ultimate validation is likely to come from
independent studies using whole-genome sequencing technologies. When these data are placed into the public
domain with appropriate meta-data (eg, location of samples), the raw sequence data will be processed using
our proven pipeline and included in PlasmoView.

Figure 3. Pf chromosome 13, 1770 kb to 1820 kb. Cheeseman et al [8] described this region as being
potentially important to the evolution of artemisinin-resistant Pf parasites in Southeast Asia. Of the 1355
SNPs in this 50-kb region, more than 20 exhibit high regional specificity (predominantly in SEA with high Fst
(>0.1) and low MAF (<30%)) and at least 10 may have spread or are currently spreading to other regions (high
MAF and low Fst). Gene PF3D7_1344700 (see Supplementary Figure 2D) contains an example of both: a SNP
with high regional specificity (position 1 793 121, labelled as SNP 1 in [8], Supplementary Figure 2E) and one
moving towards global fixation (position 1 794 574).

Figure 4. PfRH5 vaccine candidate in chromosome 4. The Pf Reticulocyte Binding Protein Homologue 5 (PfRH5,
PF3D7_0424100) gene is considered a promising vaccine candidate, as it seems to be essential for blood-stage
parasite invasion of red blood cells [20]. The locus contains 28 SNPs, including 5 non-synonymous mutations
with MAF > 10% (I410M, S197Y, H148D, Y147H [19] and K419N [20]).

Figure 5. AMA1 vaccine candidate in chromosome 11. The apical membrane antigen 1 gene (AMA1,
Chromosome 11, PF3D7_1133400) has long been recognised as a vaccine candidate and is currently being
evaluated in clinical trials [22]. The high number of SNPs with intermediate MAF and low Fst indicates that
this locus is under balancing selection.

Figure 6. Mitochondrial genome. The Pf mitochondrion is a small, uniparentally inherited organelle and its
DNA (6 kb) is used for investigating Pf evolution and migration [23, 24, 26]. Six SNPs exhibit population
differentiation (Fst > 0.05; mt74, mt772, mt1692, mt1776, mt2383 and mt2641) including three common
alleles (MAF > 5%; mt772, mt1692 and mt1776) used to support an African origin for the species [24, 26].

The case studies above demonstrate the utility of PlasmoView to support informed decision-making processes,
whether they take place in the clinic, laboratory or public policy arena. The genomic variation identified in
drug resistance (eg, PfCRT, PfDHFR and PfMDR1) and vaccine candidate (eg, PfRH5, MSP3.8 and AMA1) genes
will help to define the potential repertoire of polymorphisms for follow-up experiments. Similarly, the
identification of polymorphisms that drive genetic differentiation at the continental, regional and village level
(eg, in drug resistance and surfin genes) will facilitate the barcoding of parasites for use in surveillance
applications. PlasmoView has the functionality to display the variation with annotations, geographical
distribution and frequency data and population differentiation metrics in real time.

Whilst PlasmoView is presented as a tool to visualise and summarise Pf variation, it will be extended to other
Plasmodium species as the data become available. The next phase of the work is to characterise and display
variation other than SNPs, leveraging the high-coverage, paired-end nature of sequence data [29] and results
from genotyping approaches (eg, comparative genomic hybridization [30]). Functionality will expand to
include additional population genetic statistics (eg, Tajima's D [31]).

In summary, PlasmoView is a powerful, scalable tool for the interactive geographic visualisation of Pf
mutations. Using a high-quality set of polymorphisms, this study shows that PlasmoView is useful in
confirming existing results, identifying potential avenues for further research and presenting complex genetic
data to a broad audience.

Supplementary Data 

Supplementary materials are available at The Journal of Infectious Diseases 
online (http://jid.oxfordjournals.org/). Supplementary materials consist of 
data provided by the author that are published to benefit the reader. The
posted materials are not copyedited. The contents of all supplementary data 
are the sole responsibility of the authors. Questions or messages regarding 
errors should be addressed to the author. 